O projeto tem por objetivo utilizar técnicas de análise de dados para limpar e posteriormente realizar uma análise exploratória de dados obtidos de uma única cidade na plataforma colaborativa OpenStreetMap.
Desse modo, escolhi a cidade de Brasília, capital federal, local onde vivo a pouco mais de 1 ano, para ser o objeto de estudo desse projeto.
import folium
print('A versão de Folium carrega é a',folium.__version__)
bsb = folium.Map([-15.7880, -47.8468], zoom_start=11)
bsb
import xml.etree.ElementTree as ET
from collections import Counter
import pandas as pd
osmfile = "C:\\Users\\jferraz\\Documents\\Pessoal\\Cursos\\Nanodegree\\Analista de Dados\\P3\\dados projeto\\brasilia_brazil.osm"
Iniciaremos explorando os atributos associados aos nodes dos dados. Em seguida construíremos um dicionário que receberá todas as informações relativas a cada node.
count_node = 0
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "node":
if count_node > 5:
break
for tag in elem.iter():
count_node += 1
print(tag.attrib)
A descrição do significado de cada atributo do node pode ser encontrada na página Wiki do OpenStreetMap. Em seguida, iremos consolidar os dados dos nodes em uma lista de dicionários.
node_list = list() #lista de dicionários
node_dict = {} #dicionário que receberá a informação de cada node
attrib_nodes = ('id' and 'changeset' and 'user' and 'uid' and 'version' and 'timestamp' and 'lon' and 'lat') #atributos dos nodes
#PROCESSO ITERATIVO QUE IRÁ PERCORRER OS NODES E CONSTRUIR DICIONÁRIOS COM AS INFORMAÇÕES RELATIVA A ELES E POSTERIORMENTE
#ADICIONÁ-LO A LISTA.
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "node":
for tag in elem.iter():
if len(node_dict) == 0 and attrib_nodes in tag.attrib:
node_dict['id'] = tag.attrib['id']
node_dict['lat'] = tag.attrib['lat']
node_dict['lon'] = tag.attrib['lon']
node_dict['user'] = tag.attrib['user']
node_dict['uid'] = tag.attrib['uid']
node_dict['version'] = tag.attrib['version']
node_dict['changeset'] = tag.attrib['changeset']
node_dict['timestamp'] = tag.attrib['timestamp']
node_list.append(node_dict)
elif len(node_dict) != 0 and attrib_nodes in tag.attrib:
node_dict = {}
node_dict['id'] = tag.attrib['id']
node_dict['lat'] = tag.attrib['lat']
node_dict['lon'] = tag.attrib['lon']
node_dict['user'] = tag.attrib['user']
node_dict['uid'] = tag.attrib['uid']
node_dict['version'] = tag.attrib['version']
node_dict['changeset'] = tag.attrib['changeset']
node_dict['timestamp'] = tag.attrib['timestamp']
node_list.append(node_dict)
node_list[0:2] #AMOSTRA DOS DADOS CONTIDOS NA LISTA DE DICIONÁRIOS DE NODES
Em seguida, podemos explorar os atributos associados as tags vinculados aos nodes. Em seguida construíremos um dicionário que receberá todas as informações relativas a cada tag.
count_tag = 0
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "node":
for node in elem.iter('node'):
for tag in elem.iter('tag'):
if count_tag > 5:
break
count_tag += 1
print(tag.attrib)
Cada tag designa um atributo geográfico a uma característica apresentada por um node, way ou relation específico. Para maiores detalhes e exemplos de diferentes tags, podem ser consultados pelo seguinte link. De forma similiar, iremos consolidar os dados das tags dos nodes em uma lista de dicionário.
node_tags_dict = {} #dicionário com os dados individuais das tags do nodes
node_tags_list = list() #lista de dicionários
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "node":
for node in elem.iter('node'):
for tag in elem.iter('tag'):
if len(node_tags_dict) == 0:
node_tags_dict['key'] = tag.attrib['k']
node_tags_dict['value'] = tag.attrib['v']
node_tags_dict['node_id'] = node.attrib['id']
node_tags_list.append(node_tags_dict)
elif len (node_tags_dict) != 0:
node_tags_dict = {}
node_tags_dict['key'] = tag.attrib['k']
node_tags_dict['value'] = tag.attrib['v']
node_tags_dict['node_id'] = node.attrib['id']
node_tags_list.append(node_tags_dict)
node_tags_list[0:2] #amostra da lista de dicionários
Por conseguinte, iremos explorar os atributos associados as ways. Para maiores detalhes sobre essas unidades, clique aqui. Em seguida construíremos um dicionário que receberá todas as informações relativas a cada way.
count_way = 0
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "way":
for tag in elem.iter('way'):
if count_way > 5:
break
count_way += 1
print(tag.attrib)
ways_list = list() #lista de dicionários com dados de todos os ways
ways_dict = {} #dicionários que receberam os dados relativos a cada way individualmente
attrib_ways = ('id' and 'changeset' and 'user' and 'uid' and 'version' and 'timestamp') #atributos vinculados as ways
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "way":
for tag in elem.iter():
if len(ways_dict) == 0 and attrib_ways in tag.attrib:
ways_dict['id'] = tag.attrib['id']
ways_dict['user'] = tag.attrib['user']
ways_dict['uid'] = tag.attrib['uid']
ways_dict['version'] = tag.attrib['version']
ways_dict['changeset'] = tag.attrib['changeset']
ways_dict['timestamp'] = tag.attrib['timestamp']
ways_list.append(ways_dict)
elif len(ways_dict) != 0 and attrib_ways in tag.attrib:
ways_dict = {}
ways_dict['id'] = tag.attrib['id']
ways_dict['user'] = tag.attrib['user']
ways_dict['uid'] = tag.attrib['uid']
ways_dict['version'] = tag.attrib['version']
ways_dict['changeset'] = tag.attrib['changeset']
ways_dict['timestamp'] = tag.attrib['timestamp']
ways_list.append(ways_dict)
ways_list[0:2] #amostra da lista de dicionários relativas aos ways
Em seguida, vamos verificar os atributos associados as tags dos ways. Em seguida construíremos uma lista de dicionários que receberá todas as informações relativas a cada tag vinculada aos ways.
count_ways_tag = 0
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "way":
for way in elem.iter('way'):
for tag in elem.iter('tag'):
if count_ways_tag > 5:
break
count_ways_tag += 1
print(tag.attrib)
ways_tags_dict = {} #dicionário para receber os atributos das tags associados aos ways
ways_tags_list = list() #lista de dicionários
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "way":
for way in elem.iter('way'):
for tag in elem.iter('tag'):
if len(ways_tags_dict) == 0:
if 'k' in tag.attrib:
ways_tags_dict['key'] = tag.attrib['k']
if 'v' in tag.attrib:
ways_tags_dict['value'] = tag.attrib['v']
ways_tags_dict['way_id'] = way.attrib['id']
ways_tags_list.append(ways_tags_dict)
elif len (ways_tags_dict) != 0:
ways_tags_dict = {}
if 'k' in tag.attrib:
ways_tags_dict['key'] = tag.attrib['k']
if 'v' in tag.attrib:
ways_tags_dict['value'] = tag.attrib['v']
ways_tags_dict['way_id'] = way.attrib['id']
ways_tags_list.append(ways_tags_dict)
ways_tags_list[0:2] #uma amostra da lista de dicionário
Por conseguinte, vamos verificar os atributos associados aos nodes dos ways. Posteriomente, construíremos uma lista de dicionários que receberá todas as informações relativas a cada node vinculado aos ways.
count_ways_nodes = 0
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "way":
for way in elem.iter('way'):
for nd in elem.iter('nd'):
if count_ways_nodes > 5:
break
count_ways_nodes += 1
print(nd.attrib)
ways_nodes_dict = {} #dicionário para receber os atributos das tags associados aos ways
ways_nodes_list = list() #lista de dicionários
for event, elem in ET.iterparse(osmfile, events=("start",)):
if elem.tag == "way":
for way in elem.iter('way'):
for nd in elem.iter('nd'):
if len(ways_nodes_dict) == 0:
ways_nodes_dict['id'] = way.attrib['id']
ways_nodes_dict['node_id'] = nd.attrib['ref']
ways_nodes_list.append(ways_nodes_dict)
elif len (ways_nodes_dict) != 0:
ways_nodes_dict = {}
ways_nodes_dict['id'] = way.attrib['id']
ways_nodes_dict['node_id'] = nd.attrib['ref']
ways_nodes_list.append(ways_nodes_dict)
ways_nodes_list[0:10]
A cidade de Brasília foi uma cidade planejada e possui características únicas. Para os menos familiarizados, os endereços da cidade são um verdadeiro desafio para se compreender em um primeiro contato. Abaixo vamos verificar os mais diversos tipos e analisar se seguem um padrão.
addr_count = 0
for tag in node_tags_list:
if tag['key'] == 'addr:street':
if addr_count > 30:
break
addr_count += 1
print(tag['value'])
Na lista acima é possível verificar que há diversos endereços que usam abreviações para Conjunto, Quadra, Condomínio, Chácara e assim por diante. Assim, iremos padronizar diversos nomes que compõem o nome dos endereços.
#Criar um dicionário que irá receber um conjunto de chaves/valores que serão utilizados para correção dos endereços.
map_stnames = {'Conj.': 'Conjunto',
'CONJUNTO': 'Conjunto',
'Q.': 'Quadra',
'QUADRA': 'Quadra',
'Cln': 'CLN',
'Qnm': 'QNM',
'BLOCO': 'Bloco',
'Cond.': 'Condomínio',
'Qnl': 'QNL',
'Scln': 'SCLN',
'Cls': 'CLS',
'Qn': 'QN',
'Qr': 'QR',
'Ch.': 'Chácara',
'BL.': 'Bloco',
'Av.': 'Avenida'
}
for k,v in map_stnames.items():
for i,tag in enumerate(node_tags_list):
if tag['key'] == 'addr:street':
if k in tag['value']:
location = tag['value'].index(k) #determinar o index onde a key de map_stnames se encontra na string
if location != 0: #caso a sub-string não esteja no início da string
comp_nrua = len(k)
inicio_string = tag['value'][:location]
final_string = tag['value'][(location+comp_nrua):]
node_tags_list[i]['value'] = inicio_string+v+final_string #substituição na lista que guarda o nome das ruas
else: #caso a substring esteja no início da string
node_tags_list[i]['value'] = v+tag['value'][len(k):]
#Verificar se após a padronização ainda existem endereços com "QUADRA", "Av.", "Cond.", por exemplo.
for tag in node_tags_list:
if tag['key'] == 'addr:street':
if 'QUADRA' in tag['value']:
print(tag['value'])
if 'Av.' in tag['value']:
print(tag['value'])
if 'Cond.' in tag['value']:
print(tag['value'])
Pelo código acima, é possível inferir que a substituição foi bem sucedida.
Analisar o padrão que os números dos CEP's estão representados no conjunto de dados.
for tag1,tag2 in zip(node_tags_list,ways_tags_list):
if tag1['key'] == 'addr:postcode':
print(tag1['value'])
if tag2['key'] == 'addr:postcode':
print(tag2['value'])
Com os dados acima, pode-se verificar uma falta de padronização na representação do CEP, alguns utilizam hífen como separador dos três último algarismos, outro não. Alguns registros possuem sepador de milhar, enquanto outros omitem. Dessa forma é necessário uma padronização da representação.
#PADRONIZAR OS CEP's
for i,tag in enumerate(node_tags_list):
if tag['key'] == 'addr:postcode':
if "." in tag['value']:
node_tags_list[i]['value'] = tag['value'].replace(".","") #retirada do ponto como separador de milhar
elif "-" not in tag['value']:
node_tags_list[i]['value'] = tag['value'][:5]+"-"+tag['value'][5:] #inseração do '-'
for i,tag in enumerate(ways_tags_list):
if tag['key'] == 'addr:postcode':
if "." in tag['value']:
ways_tags_list[i]['value'] = tag['value'].replace(".","") #retirada do ponto como separador de milhar
elif "-" not in tag['value']:
ways_tags_list[i]['value'] = tag['value'][:5]+"-"+tag['value'][5:] #inseração do '-'
#Repetição do código para confirmar a padronização da representação númerica dos CEP's
count_cep = 0
for tag1,tag2 in zip(node_tags_list,ways_tags_list):
if count_cep > 20:
break
if tag1['key'] == 'addr:postcode':
count_cep += 1
print(tag1['value'])
if tag2['key'] == 'addr:postcode':
count_cep += 1
print(tag2['value'])
Os dados acima trabalhados irão alimentar um banco de dados. Para isso, serão criados dataframes, que por sua vez irão ser utilizados para alimentar o banco de dados.
#CRIAR UM DATAFRAME PARA NODES
nodes = pd.DataFrame(columns=['id','lat','lon','user','uid','version','changeset','timestamp'])
nodes['id'] = [node['id'] for node in node_list]
nodes.lat = [node['lat'] for node in node_list]
nodes.lon = [node['lon'] for node in node_list]
nodes.user = [node['user'] for node in node_list]
nodes.uid = [node['uid'] for node in node_list]
nodes.version = [node['version'] for node in node_list]
nodes.changeset = [node['changeset'] for node in node_list]
nodes.timestamp = [node['timestamp'] for node in node_list]
nodes['timestamp'] = pd.to_datetime(nodes['timestamp']) #conversão para datetime
nodes.head()
#CRIAR UM DATAFRAME PARA NODE'S TAGS
node_tags = pd.DataFrame(columns=['key','value','node_id'])
node_tags.key = [node['key'] for node in node_tags_list]
node_tags.value = [node['value'] for node in node_tags_list]
node_tags.node_id = [node['node_id'] for node in node_tags_list]
node_tags.head()
#CRIAR UM DATAFRAME PARA WAYS
ways = pd.DataFrame(columns=['id','user','uid','version','changeset','timestamp'])
ways.id = [way['id'] for way in ways_list]
ways.user = [way['user'] for way in ways_list]
ways.uid = [way['uid'] for way in ways_list]
ways.version = [way['version'] for way in ways_list]
ways.changeset = [way['changeset'] for way in ways_list]
ways.timestamp = [way['timestamp'] for way in ways_list]
ways['timestamp'] = pd.to_datetime(ways['timestamp']) #conversão para datetime
ways.head()
#CRIAR UM DATAFRAME PARA WAY'S TAGS
ways_tags = pd.DataFrame(columns=['key','value','way_id'])
ways_tags.key = [tag['key'] for tag in ways_tags_list]
ways_tags.value = [tag['value'] for tag in ways_tags_list]
ways_tags.way_id = [tag['way_id'] for tag in ways_tags_list]
ways_tags.head()
#CRIAR UM DATAFRAME PARA WAY'S NODES
ways_nodes = pd.DataFrame(columns=['id','node_id'])
ways_nodes['id'] = [node['id'] for node in ways_nodes_list]
ways_nodes.node_id = [node['node_id'] for node in ways_nodes_list]
ways_nodes.head()
import sqlite3
conn = sqlite3.connect('ud_bsb.db')
c = conn.cursor()
# Criação da tabela NODES no banco
c.execute('''CREATE TABLE nodes
(id INTEGER PRIMARY KEY,
lat REAL,
lon REAL,
user TEXT,
uid INTEGER,
version INTEGER,
changeset INTEGER,
timestamp TEXT
)''')
#Alimentação da tabela nodes do bd com o dataframe correspondente;
nodes.to_sql('nodes', conn, if_exists='append', index=False)
# Criação da tabela NODE TAGS no banco
c.execute('''CREATE TABLE nodes_tags
(id INTEGER PRIMARY KEY AUTOINCREMENT,
key TEXT,
value TEXT,
node_id INTEGER,
FOREIGN KEY(node_id) REFERENCES nodes(id)
)''')
#Alimentação da tabela nodes_tags do bd com o dataframe correspondente;
node_tags.to_sql('nodes_tags', conn, if_exists='append', index=False)
# Criação da tabela WAYS no banco
c.execute('''CREATE TABLE ways
(id INTEGER PRIMARY KEY,
user TEXT,
uid INTEGER,
version INTEGER,
changeset INTEGER,
timestamp TEXT
)''')
#Alimentação da tabela ways do bd com o dataframe correspondente;
ways.to_sql('ways', conn, if_exists='append', index=False)
# CRIAR TABELA WAYS_TAGS no banco
c.execute('''CREATE TABLE ways_tags (
key TEXT NOT NULL,
value TEXT NOT NULL,
way_id INTEGER NOT NULL,
FOREIGN KEY (way_id) REFERENCES ways(id)
);''')
#Alimentação da tabela ways_tags do bd com o dataframe correspondente;
ways_tags.to_sql('ways_tags', conn, if_exists='append', index=False)
# CRIAR TABELA WAYS_NODES no banco
c.execute('''CREATE TABLE ways_nodes (
id INTEGER NOT NULL,
node_id INTEGER NOT NULL,
FOREIGN KEY (id) REFERENCES ways(id),
FOREIGN KEY (node_id) REFERENCES nodes(id)
);''')
#Alimentação da tabela ways_nodes do bd com o dataframe correspondente;
ways_nodes.to_sql('ways_nodes', conn, if_exists='append', index=False)
#COMMIT DAS ALTERAÇÕES
conn.commit()
A capital federal, Brasília, é sede de vários órgãos públicos federais, além disso, é conhecida por ser o centro político do país. Em seguida, iremos levantar a localização dos órgãos públicos que estão mapeados na plataforma do OpenStreetMap.
#Aquisição dos atributos dos nodes que são ligadas a instalações prediais governamentais (KEY = OFFICE E VALUE = GOVERNMENT)
gov_data_loc = c.execute('''select key,value,nodes.lat,nodes.lon from nodes_tags
join nodes on nodes_tags.node_id = nodes.id where node_id in (SELECT nodes.id from nodes_tags join nodes
on nodes_tags.node_id = nodes.id where nodes_tags.key = 'office' and nodes_tags.value = 'government')''')
df_gov_loc = pd.DataFrame(gov_data_loc.fetchall(),columns=['key','value','lat','lon']) #carregar em um dataframe
df_gov_loc.head()
#DETERMINAR TODOS OS NODES ASSOCIADOS A WAYS QUE POSSUEM TAGS COM UMA KEY = OFFICE E VALUE GOVERNMENT
ways_gov_loc = c.execute('''SELECT key,value, nodes.lat, nodes.lon from ways_tags JOIN ways_nodes ON
ways_tags.way_id = ways_nodes.id JOIN nodes ON ways_nodes.node_id = nodes.id WHERE nodes.lat in
(SELECT nodes.lat from ways_tags JOIN ways_nodes ON ways_tags.way_id = ways_nodes.id JOIN nodes ON
ways_nodes.node_id = nodes.id WHERE ways_tags.value = 'government') AND nodes.lon in
(SELECT nodes.lon from ways_tags JOIN ways_nodes ON ways_tags.way_id = ways_nodes.id JOIN
nodes ON ways_nodes.node_id = nodes.id WHERE ways_tags.value = 'government')''')
#ARMAZENAR A QUERY EM UM DATAFRAME
way_gov_loc = pd.DataFrame(ways_gov_loc.fetchall(),columns=['key','value','lat','lon']) #carregar em um dataframe
way_gov_loc.head()
#CRIAR UM DATAFRAME PARA ARMAZENAR OS DADOS DE LOCALIZAÇÃO DOS ÓRGÃOES PÚBLICOS ARMAZENADOS EM df_gov_loc e way_gov_loc
loc_gov_ways_nodes = pd.DataFrame(columns=['nome','lat','lon']) #carregar em um dataframe
index_count_loc_gov = 0
for index,row in way_gov_loc.iterrows():
if row['key'] == 'name':
if row['value'] not in loc_gov_ways_nodes['nome'].values:
index_count_loc_gov += 1
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'nome',row['value'])
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'lat',row['lat'])
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'lon',row['lon'])
#AO TODO FORAM OBTIDOS A LOCALIZAÇÃO DE 107 ÓRGÃOS PÚBLICOS A PARTIR DOS DADOS ARMAZENADOS EM way_gov_loc
loc_gov_ways_nodes.shape
for index,row in df_gov_loc.iterrows():
if row['key'] == 'name':
if row['value'] not in loc_gov_ways_nodes['nome'].values:
index_count_loc_gov += 1
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'nome',row['value'])
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'lat',row['lat'])
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'lon',row['lon'])
#AO TODO FORAM OBTIDOS A LOCALIZAÇÃO DE 131 ÓRGÃOS PÚBLICOS A PARTIR DOS DADOS ARMAZENADOS EM df_gov_loc
loc_gov_ways_nodes.shape
map_gov = folium.Map([-15.7880, -47.8468], zoom_start=12)
for index,row in loc_gov_ways_nodes.iterrows():
folium.Marker([row['lat'], row['lon']], popup=row['nome'].encode("latin-1")).add_to(map_gov)
map_gov
Por meio de zoom na região da esplanada dos ministérios, é possível verificar que vários dos órgãos ali sediados não possuem indicação no mapa.
esplanada = folium.Map([-15.7996,-47.8748],zoom_start=15)
for index,row in loc_gov_ways_nodes.iterrows():
folium.Marker([row['lat'], row['lon']], popup=row['nome'].encode("latin-1")).add_to(esplanada)
esplanada
A página wiki do OpenStreetmap recomenada que para escritórios governamentais seja utilizada a tag office=government. Como explica o texto abaixo.
The tag office=government is used to tag government offices of a (supra)national, regional or local government agency or department. In these offices, staff work directly for the government and carry out tasks to administer facilities, operate registries and licensing bureaus, regulate lands and/or people, etc.
Infelizmente, não é isso que ocorre em todos os registros adicionados ao site, como bem mostra os extrato do código xml da cidade de Brasília.
Ministério dos Transportes
<node id="2897250180" lat="-15.7973305" lon="-47.8677576" version="2" timestamp="2014-10-25T01:27:41Z" changeset="26312704" uid="360183" user="wille">
<tag k="ref" v="10"/>
<tag k="name" v="Ministério dos Transportes"/>
<tag k="phone" v="+55 61 40039846"/>
<tag k="amenity" v="bicycle_rental"/>
<tag k="website" v="http://www.bikebrasilia.com"/>
<tag k="capacity" v="12"/>
<tag k="operator" v="Bike Brasília"/>
<tag k="opening_hours" v="Mo-Su 06:00-24:00"/>
Ministério da Cultura / Meio Ambiente
<way id="46114584" version="8" timestamp="2015-11-30T20:05:06Z" changeset="35673450" uid="3147342" user="jadson_reis">
<nd ref="587938477"/>
<nd ref="587938480"/>
<nd ref="3869733864"/>
<nd ref="587938476"/>
<nd ref="587938477"/>
<tag k="name" v="Ministério da Cultura / Meio Ambiente"/>
<tag k="building" v="yes"/>
<tag k="addr:street" v="Eixo Monumental"/>
<tag k="addr:housename" v="Bloco B"/>
<tag k="building:levels" v="11"/>
orgp_nodetag_extra = c.execute('''SELECT nodes_tags.value, nodes.lat, nodes.lon from nodes_tags
JOIN nodes ON nodes.id = nodes_tags.node_id where value like '%Ministério%' ''')
df_extra_opb_ntag = pd.DataFrame(orgp_nodetag_extra.fetchall(),columns=['value','lat','lon']) #carregar em um dataframe
df_extra_opb_ntag
df_extra_opb_ntag.drop(df_extra_opb_ntag.index[8], inplace=True) #EXCLUSÃO DA ENTRADA REFERENTE A INSTALAÇÃO PREDIAL RELIGIOSA
orgp_waytag_extra = c.execute('''SELECT value, nodes.lat, nodes.lon from ways_tags JOIN ways_nodes ON
ways_tags.way_id = ways_nodes.id JOIN nodes ON ways_nodes.node_id = nodes.id where value like '%Ministério%' ''')
df_extra_opb_wtag = pd.DataFrame(orgp_waytag_extra.fetchall(),columns=['value','lat','lon']) #carregar em um dataframe
df_extra_opb_wtag.head() #como um way é formado pela combinação de vários nodes, é natural haver múltiplos registros.
for index,row in df_extra_opb_ntag.iterrows():
if row['value'] not in loc_gov_ways_nodes['nome'].values:
index_count_loc_gov += 1
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'nome',row['value'])
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'lat',row['lat'])
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'lon',row['lon'])
#AO TODO FORAM OBTIDOS A LOCALIZAÇÃO DE 8 ÓRGÃOS PÚBLICOS A PARTIR DOS DADOS ARMAZENADOS EM df_extra_opb_ntag
loc_gov_ways_nodes.shape
for index,row in df_extra_opb_wtag.iterrows():
if row['value'] not in loc_gov_ways_nodes['nome'].values:
index_count_loc_gov += 1
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'nome',row['value'])
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'lat',row['lat'])
loc_gov_ways_nodes.set_value(index_count_loc_gov, 'lon',row['lon'])
#AO TODO FORAM OBTIDOS A LOCALIZAÇÃO DE 24 ÓRGÃOS PÚBLICOS A PARTIR DOS DADOS ARMAZENADOS EM df_extra_opb_wtag
loc_gov_ways_nodes.shape
loc_gov_ways_nodes.head()
mapa_bsb_orgaos = folium.Map([-15.7880, -47.8468], zoom_start=12)
for index,row in loc_gov_ways_nodes.iterrows():
folium.Marker([row['lat'], row['lon']], popup=row['nome'].encode("latin-1")).add_to(mapa_bsb_orgaos)
Por meio do mapa, abaixo foi possível verificar que a maioria dos Ministérios localizados na esplanada agora possuem marcadores associados a sua localização.
mapa_bsb_orgaos
Os dados de registros do nodes e ways possuem informações sobre horário e dia que as informações foram adicionadas a platafroma do OpenstreetMap. Dessa maneira, é possível explorar essas informações em busca de padrões de comportamento dos usuários, como os horários mais frequentes de uso, bem como os dias e/ou meses do ano.
nodehour00 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '00' AND strftime('%H',timestamp) < '01' group by user order by user desc;''')
df_nodeh00 = pd.DataFrame(nodehour00.fetchall(),columns=['user','count'])
nodehour01 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '01' AND strftime('%H',timestamp) < '02' group by user order by user desc;''')
df_nodeh01 = pd.DataFrame(nodehour01.fetchall(),columns=['user','count'])
nodehour02 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '02' AND strftime('%H',timestamp) < '03' group by user order by user desc;''')
df_nodeh02 = pd.DataFrame(nodehour02.fetchall(),columns=['user','count'])
nodehour03 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '03' AND strftime('%H',timestamp) < '04' group by user order by user desc;''')
df_nodeh03 = pd.DataFrame(nodehour03.fetchall(),columns=['user','count'])
nodehour04 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '04' AND strftime('%H',timestamp) < '05' group by user order by user desc;''')
df_nodeh04 = pd.DataFrame(nodehour04.fetchall(),columns=['user','count'])
nodehour05 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '05' AND strftime('%H',timestamp) < '06' group by user order by user desc;''')
df_nodeh05 = pd.DataFrame(nodehour05.fetchall(),columns=['user','count'])
nodehour06 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '06' AND strftime('%H',timestamp) < '07' group by user order by user desc;''')
df_nodeh06 = pd.DataFrame(nodehour06.fetchall(),columns=['user','count'])
nodehour07 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '07' AND strftime('%H',timestamp) < '08' group by user order by user desc;''')
df_nodeh07 = pd.DataFrame(nodehour07.fetchall(),columns=['user','count'])
nodehour08 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '08' AND strftime('%H',timestamp) < '09' group by user order by user desc;''')
df_nodeh08 = pd.DataFrame(nodehour08.fetchall(),columns=['user','count'])
nodehour09 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '09' AND strftime('%H',timestamp) < '10' group by user order by user desc;''')
df_nodeh09 = pd.DataFrame(nodehour09.fetchall(),columns=['user','count'])
nodehour10 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '10' AND strftime('%H',timestamp) < '11' group by user order by user desc;''')
df_nodeh10 = pd.DataFrame(nodehour10.fetchall(),columns=['user','count'])
nodehour11 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '11' AND strftime('%H',timestamp) < '12' group by user order by user desc;''')
df_nodeh11 = pd.DataFrame(nodehour11.fetchall(),columns=['user','count'])
nodehour12 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '12' AND strftime('%H',timestamp) < '13' group by user order by user desc;''')
df_nodeh12 = pd.DataFrame(nodehour12.fetchall(),columns=['user','count'])
nodehour13 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '13' AND strftime('%H',timestamp) < '14' group by user order by user desc;''')
df_nodeh13 = pd.DataFrame(nodehour13.fetchall(),columns=['user','count'])
nodehour14 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '14' AND strftime('%H',timestamp) < '15' group by user order by user desc;''')
df_nodeh14 = pd.DataFrame(nodehour14.fetchall(),columns=['user','count'])
nodehour15 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '15' AND strftime('%H',timestamp) < '16' group by user order by user desc;''')
df_nodeh15 = pd.DataFrame(nodehour15.fetchall(),columns=['user','count'])
nodehour16 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '16' AND strftime('%H',timestamp) < '17' group by user order by user desc;''')
df_nodeh16 = pd.DataFrame(nodehour16.fetchall(),columns=['user','count'])
nodehour17 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '17' AND strftime('%H',timestamp) < '18' group by user order by user desc;''')
df_nodeh17 = pd.DataFrame(nodehour17.fetchall(),columns=['user','count'])
nodehour18 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '18' AND strftime('%H',timestamp) < '19' group by user order by user desc;''')
df_nodeh18 = pd.DataFrame(nodehour18.fetchall(),columns=['user','count'])
nodehour19 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '19' AND strftime('%H',timestamp) < '20' group by user order by user desc;''')
df_nodeh19 = pd.DataFrame(nodehour19.fetchall(),columns=['user','count'])
nodehour20 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '20' AND strftime('%H',timestamp) < '21' group by user order by user desc;''')
df_nodeh20 = pd.DataFrame(nodehour20.fetchall(),columns=['user','count'])
nodehour21 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '21' AND strftime('%H',timestamp) < '22' group by user order by user desc;''')
df_nodeh21 = pd.DataFrame(nodehour21.fetchall(),columns=['user','count'])
nodehour22 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '22' AND strftime('%H',timestamp) < '23' group by user order by user desc;''')
df_nodeh22 = pd.DataFrame(nodehour22.fetchall(),columns=['user','count'])
nodehour23 = c.execute('''SELECT user, count(user) as user FROM nodes
WHERE strftime('%H',timestamp) >= '23' AND strftime('%H',timestamp) < '24' group by user order by user desc;''')
df_nodeh23 = pd.DataFrame(nodehour23.fetchall(),columns=['user','count'])
wayhour00 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '00' AND strftime('%H',timestamp) < '01' group by user order by user desc;''')
df_wayh00 = pd.DataFrame(nodehour00.fetchall(),columns=['user','count'])
wayhour01 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '01' AND strftime('%H',timestamp) < '02' group by user order by user desc;''')
df_wayh01 = pd.DataFrame(wayhour01.fetchall(),columns=['user','count'])
wayhour02 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '02' AND strftime('%H',timestamp) < '03' group by user order by user desc;''')
df_wayh02 = pd.DataFrame(wayhour02.fetchall(),columns=['user','count'])
wayhour03 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '03' AND strftime('%H',timestamp) < '04' group by user order by user desc;''')
df_wayh03 = pd.DataFrame(wayhour03.fetchall(),columns=['user','count'])
wayhour04 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '04' AND strftime('%H',timestamp) < '05' group by user order by user desc;''')
df_wayh04 = pd.DataFrame(wayhour04.fetchall(),columns=['user','count'])
wayhour05 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '05' AND strftime('%H',timestamp) < '06' group by user order by user desc;''')
df_wayh05 = pd.DataFrame(wayhour05.fetchall(),columns=['user','count'])
wayhour06 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '06' AND strftime('%H',timestamp) < '07' group by user order by user desc;''')
df_wayh06 = pd.DataFrame(wayhour06.fetchall(),columns=['user','count'])
wayhour07 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '07' AND strftime('%H',timestamp) < '08' group by user order by user desc;''')
df_wayh07 = pd.DataFrame(wayhour07.fetchall(),columns=['user','count'])
wayhour08 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '08' AND strftime('%H',timestamp) < '09' group by user order by user desc;''')
df_wayh08 = pd.DataFrame(wayhour08.fetchall(),columns=['user','count'])
wayhour09 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '09' AND strftime('%H',timestamp) < '10' group by user order by user desc;''')
df_wayh09 = pd.DataFrame(wayhour09.fetchall(),columns=['user','count'])
wayhour10 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '10' AND strftime('%H',timestamp) < '11' group by user order by user desc;''')
df_wayh10 = pd.DataFrame(wayhour10.fetchall(),columns=['user','count'])
wayhour11 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '11' AND strftime('%H',timestamp) < '12' group by user order by user desc;''')
df_wayh11 = pd.DataFrame(wayhour11.fetchall(),columns=['user','count'])
wayhour12 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '12' AND strftime('%H',timestamp) < '13' group by user order by user desc;''')
df_wayh12 = pd.DataFrame(wayhour12.fetchall(),columns=['user','count'])
wayhour13 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '13' AND strftime('%H',timestamp) < '14' group by user order by user desc;''')
df_wayh13 = pd.DataFrame(wayhour13.fetchall(),columns=['user','count'])
wayhour14 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '14' AND strftime('%H',timestamp) < '15' group by user order by user desc;''')
df_wayh14 = pd.DataFrame(wayhour14.fetchall(),columns=['user','count'])
wayhour15 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '15' AND strftime('%H',timestamp) < '16' group by user order by user desc;''')
df_wayh15 = pd.DataFrame(wayhour15.fetchall(),columns=['user','count'])
wayhour16 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '16' AND strftime('%H',timestamp) < '17' group by user order by user desc;''')
df_wayh16 = pd.DataFrame(wayhour16.fetchall(),columns=['user','count'])
wayhour17 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '17' AND strftime('%H',timestamp) < '18' group by user order by user desc;''')
df_wayh17 = pd.DataFrame(wayhour17.fetchall(),columns=['user','count'])
wayhour18 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '18' AND strftime('%H',timestamp) < '19' group by user order by user desc;''')
df_wayh18 = pd.DataFrame(wayhour18.fetchall(),columns=['user','count'])
wayhour19 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '19' AND strftime('%H',timestamp) < '20' group by user order by user desc;''')
df_wayh19 = pd.DataFrame(wayhour19.fetchall(),columns=['user','count'])
wayhour20 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '20' AND strftime('%H',timestamp) < '21' group by user order by user desc;''')
df_wayh20 = pd.DataFrame(wayhour20.fetchall(),columns=['user','count'])
wayhour21 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '21' AND strftime('%H',timestamp) < '22' group by user order by user desc;''')
df_wayh21 = pd.DataFrame(wayhour21.fetchall(),columns=['user','count'])
wayhour22 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '22' AND strftime('%H',timestamp) < '23' group by user order by user desc;''')
df_wayh22 = pd.DataFrame(wayhour22.fetchall(),columns=['user','count'])
wayhour23 = c.execute('''SELECT user, count(user) as user FROM ways
WHERE strftime('%H',timestamp) >= '23' AND strftime('%H',timestamp) < '24' group by user order by user desc;''')
df_wayh23 = pd.DataFrame(wayhour23.fetchall(),columns=['user','count'])
#COMPILAR O TOTAL DAS CONSTRIBUIÇÕES REALIZADAS EM NODES E WAYS EM UM DATAFRAME
str_hrs = ['00:00','01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00','12:00',
'13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00']
count_hrs = [df_nodeh00['count'].sum()+df_wayh00['count'].sum(),df_nodeh01['count'].sum()+df_wayh01['count'].sum(),
df_nodeh02['count'].sum()+df_wayh02['count'].sum(),df_nodeh03['count'].sum()+df_wayh03['count'].sum(),
df_nodeh04['count'].sum()+df_wayh04['count'].sum(),df_nodeh05['count'].sum()+df_wayh05['count'].sum(),
df_nodeh06['count'].sum()+df_wayh06['count'].sum(),df_nodeh07['count'].sum()+df_wayh07['count'].sum(),
df_nodeh08['count'].sum()+df_wayh08['count'].sum(),df_nodeh09['count'].sum()+df_wayh09['count'].sum(),
df_nodeh10['count'].sum()+df_wayh10['count'].sum(),df_nodeh11['count'].sum()+df_wayh11['count'].sum(),
df_nodeh12['count'].sum()+df_wayh12['count'].sum(),df_nodeh13['count'].sum()+df_wayh13['count'].sum(),
df_nodeh14['count'].sum()+df_wayh14['count'].sum(),df_nodeh15['count'].sum()+df_wayh15['count'].sum(),
df_nodeh16['count'].sum()+df_wayh16['count'].sum(),df_nodeh17['count'].sum()+df_wayh17['count'].sum(),
df_nodeh18['count'].sum()+df_wayh18['count'].sum(),df_nodeh19['count'].sum()+df_wayh19['count'].sum(),
df_nodeh20['count'].sum()+df_wayh20['count'].sum(),df_nodeh21['count'].sum()+df_wayh21['count'].sum(),
df_nodeh22['count'].sum()+df_wayh22['count'].sum(),df_nodeh23['count'].sum()+df_wayh23['count'].sum()]
df_count_hrs = pd.DataFrame(count_hrs, columns=['total'],index=str_hrs)
Pelo gráfico a seguir é possível, verificar o intervalo de 2 até as 9 da manhã é quando as pessoas menos colaboram em Brasília com a plataforma do OpenstreetMap. Entre meio-dia (12:00) e as 20 horas encontra-se a maior parte das constribuições do usuários, aproximadamente 55% delas. Como mostra o cálculo abaixo:
#Cálculo do percentual das contribuições entre 12 e 20 horas.
df_count_hrs['12:00' : "20:00"].sum()/df_count_hrs.sum()
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_context("talk", rc={"lines.linewidth": 2})
user_hour = sns.barplot(x=df_count_hrs.index, y="total", data=df_count_hrs, palette="Blues_d")
plt.xticks(rotation=45)
user_hour.set_ylabel("Número de Constribuições")
Depois de constastar os horários de mais frequentes de uso da plataforma pelo usuários, agora veremos se existem padrões em relações ao meses do ano. Iremos começar determinando as datas mais antiga e a mais recente dos dados que estamos analisando.
# data mais antiga e e mais recente nos dados dos nodes
c.execute("SELECT min(timestamp), max(timestamp) from nodes;")
c.fetchall()
# data mais antiga e e mais recente nos dados dos ways
c.execute("SELECT min(timestamp), max(timestamp) from ways;")
c.fetchall()
#2007
jan_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
fev_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
marc_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
abr_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
mai_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
jun_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
jul_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
ago_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
set_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
out_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
nov_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
dez_2007 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
months_2007 = pd.DataFrame([jan_2007,fev_2007,marc_2007,abr_2007,mai_2007,jun_2007,jul_2007,
ago_2007,set_2007,out_2007,nov_2007,dez_2007], columns=['Total'])
#2008
jan_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
fev_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
marc_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
abr_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
mai_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
jun_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
jul_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
ago_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
set_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
out_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
nov_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
dez_2008 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
months_2008 = pd.DataFrame([jan_2008,fev_2008,marc_2008,abr_2008,mai_2008,jun_2008,jul_2008,
ago_2008,set_2008,out_2008,nov_2008,dez_2008], columns=['Total'])
#2009
jan_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
fev_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
marc_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
abr_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
mai_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
jun_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
jul_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
ago_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
set_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
out_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
nov_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
dez_2009 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
months_2009 = pd.DataFrame([jan_2009,fev_2009,marc_2009,abr_2009,mai_2009,jun_2009,jul_2009,
ago_2009,set_2009,out_2009,nov_2009,dez_2009], columns=['Total'])
#2010
jan_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
fev_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
marc_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
abr_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
mai_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
jun_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
jul_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
ago_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
set_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
out_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
nov_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
dez_2010 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
months_2010 = pd.DataFrame([jan_2010,fev_2010,marc_2010,abr_2010,mai_2010,jun_2010,jul_2010,
ago_2010,set_2010,out_2010,nov_2010,dez_2010], columns=['Total'])
#2011
jan_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
fev_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
marc_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
abr_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
mai_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
jun_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
jul_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
ago_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
set_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
out_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
nov_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
dez_2011 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
months_2011 = pd.DataFrame([jan_2011,fev_2011,marc_2011,abr_2011,mai_2011,jun_2011,jul_2011,
ago_2011,set_2011,out_2011,nov_2011,dez_2011], columns=['Total'])
#2012
jan_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
fev_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
marc_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
abr_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
mai_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
jun_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
jul_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
ago_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
set_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
out_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
nov_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
dez_2012 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
months_2012 = pd.DataFrame([jan_2012,fev_2012,marc_2012,abr_2012,mai_2012,jun_2012,jul_2012,
ago_2012,set_2012,out_2012,nov_2012,dez_2012], columns=['Total'])
#2013
jan_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
fev_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
marc_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
abr_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
mai_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
jun_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
jul_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
ago_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
set_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
out_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
nov_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
dez_2013 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
months_2013 = pd.DataFrame([jan_2013,fev_2013,marc_2013,abr_2013,mai_2013,jun_2013,jul_2013,
ago_2013,set_2013,out_2013,nov_2013,dez_2013], columns=['Total'])
#2014
jan_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
fev_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
marc_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
abr_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
mai_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
jun_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
jul_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
ago_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
set_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
out_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
nov_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
dez_2014 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
months_2014 = pd.DataFrame([jan_2014,fev_2014,marc_2014,abr_2014,mai_2014,jun_2014,jul_2014,
ago_2014,set_2014,out_2014,nov_2014,dez_2014], columns=['Total'])
#2015
jan_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
fev_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
marc_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
abr_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
mai_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
jun_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
jul_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
ago_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
set_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
out_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
nov_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
dez_2015 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
months_2015 = pd.DataFrame([jan_2015,fev_2015,marc_2015,abr_2015,mai_2015,jun_2015,jul_2015,
ago_2015,set_2015,out_2015,nov_2015,dez_2015], columns=['Total'])
#2016
jan_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
fev_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
marc_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
abr_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
mai_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
jun_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
jul_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
ago_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
set_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
out_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
nov_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
dez_2016 = c.execute('''SELECT count(timestamp) FROM nodes WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
months_2016 = pd.DataFrame([jan_2016,fev_2016,marc_2016,abr_2016,mai_2016,jun_2016,jul_2016,
ago_2016,set_2016,out_2016,nov_2016,dez_2016], columns=['Total'])
#2007
janw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
fevw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
marcw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
abrw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
maiw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
junw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
julw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
agow_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
setw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
outw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
novw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
dezw_2007 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2007';''').fetchall()[0][0]
months_w2007 = pd.DataFrame([janw_2007,fevw_2007,marcw_2007,abrw_2007,maiw_2007,junw_2007,julw_2007,
agow_2007,setw_2007,outw_2007,novw_2007,dezw_2007], columns=['Total'])
#2008
janw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
fevw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
marcw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
abrw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
maiw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
junw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
julw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
agow_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
setw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
outw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
novw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
dezw_2008 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2008';''').fetchall()[0][0]
months_w2008 = pd.DataFrame([janw_2008,fevw_2008,marcw_2008,abrw_2008,maiw_2008,junw_2008,julw_2008,
agow_2008,setw_2008,outw_2008,novw_2008,dezw_2008], columns=['Total'])
#2009
janw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
fevw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
marcw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
abrw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
maiw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
junw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
julw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
agow_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
setw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
outw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
novw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
dezw_2009 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2009';''').fetchall()[0][0]
months_w2009 = pd.DataFrame([janw_2009,fevw_2009,marcw_2009,abrw_2009,maiw_2009,junw_2009,julw_2009,
agow_2009,setw_2009,outw_2009,novw_2009,dezw_2009], columns=['Total'])
#2010
janw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
fevw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
marcw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
abrw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
maiw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
junw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
julw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
agow_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
setw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
outw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
novw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
dezw_2010 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2010';''').fetchall()[0][0]
months_w2010 = pd.DataFrame([janw_2010,fevw_2010,marcw_2010,abrw_2010,maiw_2010,junw_2010,julw_2010,
agow_2010,setw_2010,outw_2010,novw_2010,dezw_2010], columns=['Total'])
#2011
janw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
fevw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
marcw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
abrw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
maiw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
junw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
julw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
agow_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
setw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
outw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
novw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
dezw_2011 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2011';''').fetchall()[0][0]
months_w2011 = pd.DataFrame([janw_2011,fevw_2011,marcw_2011,abrw_2011,maiw_2011,junw_2011,julw_2011,
agow_2011,setw_2011,outw_2011,novw_2011,dezw_2011], columns=['Total'])
#2012
janw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
fevw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
marcw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
abrw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
maiw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
junw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
julw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
agow_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
setw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
outw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
novw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
dezw_2012 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2012';''').fetchall()[0][0]
months_w2012 = pd.DataFrame([janw_2012,fevw_2012,marcw_2012,abrw_2012,maiw_2012,junw_2012,julw_2012,
agow_2012,setw_2012,outw_2012,novw_2012,dezw_2012], columns=['Total'])
#2013
janw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
fevw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
marcw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
abrw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
maiw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
junw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
julw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
agow_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
setw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
outw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
novw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
dezw_2013 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2013';''').fetchall()[0][0]
months_w2013 = pd.DataFrame([janw_2013,fevw_2013,marcw_2013,abrw_2013,maiw_2013,junw_2013,julw_2013,
agow_2013,setw_2013,outw_2013,novw_2013,dezw_2013], columns=['Total'])
#2014
janw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
fevw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
marcw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
abrw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
maiw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
junw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
julw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
agow_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
setw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
outw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
novw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
dezw_2014 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2014';''').fetchall()[0][0]
months_w2014 = pd.DataFrame([janw_2014,fevw_2014,marcw_2014,abrw_2014,maiw_2014,junw_2014,julw_2014,
agow_2014,setw_2014,outw_2014,novw_2014,dezw_2014], columns=['Total'])
#2015
janw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
fevw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
marcw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
abrw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
maiw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
junw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
julw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
agow_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
setw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
outw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
novw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
dezw_2015 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2015';''').fetchall()[0][0]
months_w2015 = pd.DataFrame([janw_2015,fevw_2015,marcw_2015,abrw_2015,maiw_2015,junw_2015,julw_2015,
agow_2015,setw_2015,outw_2015,novw_2015,dezw_2015], columns=['Total'])
#2016
janw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '01' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
fevw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '02' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
marcw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '03' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
abrw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '04' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
maiw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '05' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
junw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '06' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
julw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '07' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
agow_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '08' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
setw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '09' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
outw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '10' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
novw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '11' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
dezw_2016 = c.execute('''SELECT count(timestamp) FROM ways WHERE strftime('%m',timestamp) = '12' AND
strftime('%Y',timestamp) = '2016';''').fetchall()[0][0]
months_w2016 = pd.DataFrame([janw_2016,fevw_2016,marcw_2016,abrw_2016,maiw_2016,junw_2016,julw_2016,
agow_2016,setw_2016,outw_2016,novw_2016,dezw_2016], columns=['Total'])
#SOMANDO OS VALORES OBTIDOS A PARTIR DOS NODES E WAYS
total_2007 = months_w2007 + months_2007
total_2008 = months_w2008 + months_2008
total_2009 = months_w2009 + months_2009
total_2010 = months_w2010 + months_2010
total_2011 = months_w2011 + months_2011
total_2012 = months_w2012 + months_2012
total_2013 = months_w2013 + months_2013
total_2014 = months_w2014 + months_2014
total_2015 = months_w2015 + months_2015
total_2016 = months_w2016 + months_2016
#Consolidar os dados das contribuições
consolidado_meses = pd.concat([total_2007,total_2008,total_2009,total_2010,total_2011,total_2012,
total_2013,total_2014,total_2015,total_2016])
lista_meses = [1,2,3,4,5,6,7,8,9,10,11,12] * 10
anos = [2007] * 12 + [2008] * 12 + [2009] * 12 + [2010] * 12 + [2011] * 12 + [2012] * 12 + [2013] * 12 + [2014] * 12 + [2015] * 12 + [2016] * 12
consolidado_meses['mes'] = lista_meses
consolidado_meses['ano'] = anos
#Uma breve visualização no dataframe criado
consolidado_meses.tail()
#Plotar um heatmap nos dados do dataframe consolidado_meses
sns.set(style="white", context='talk')
a1 = consolidado_meses.pivot('mes','ano','Total')
ax = plt.axes()
ax.set_title('Número de Constribuições por Período')
heatmap = sns.heatmap(a1, ax=ax, cmap="YlGnBu",annot=True, fmt="d", linewidths=.5, yticklabels=['Janeiro','Fevereiro','Março','Abril','Maio','Junho','Julho','Agosto','Setembro','Outubro','Novembro','Dezembro'])
Pelos dados consolidados no heatmap acima é possível constatar que a plataforma OpenStreetMap foi se consolidar em Brasília a partir de 2011, além do que em 2014 foi o ano que registrou o maior número de constribuições. O mapa também mostra que não existe um padrão para o mês de maior atividade ao longo do ano, todavia, os meses de junho a julho e de novembro a janeiro possuem significativos registros de atividade na plataforma, talvez esse padrão possa estar relacionado com os meses comuns a férias escolares onde as pessoas possuem mais disponilidade para usar a ferramenta.
Vamos determinar que são os usúarios mais ativos do site na cidade de Brasília.
most_active_users = c.execute("SELECT u.user, COUNT(*) as num FROM (SELECT user FROM nodes UNION ALL SELECT user FROM ways) u GROUP BY u.user ORDER BY num DESC LIMIT 10;").fetchall()
most_active_users
O usuário erickdeoliveiraleal é de longe o mais ativo dos usuários, totalizando um todo de 129890 submissões na plataforma. Como mostra o cálculo abaixo, isso representa aproxidamente 24% de todas as constribuições realizadas na cidade de Brasília!
most_active_users[0][1]/consolidado_meses['Total'].sum()
top1_pathway = pd.DataFrame(c.execute('''SELECT lat,lon from nodes where user ='erickdeoliveiraleal' ''').fetchall(), columns=['lat','lon'])
#Armazenar as latitudes e longitudes em listas
lat_top1 = list(top1_pathway['lat'])
lon_top1 = list(top1_pathway['lon'])
O mapa abaixo é possível visualizar a impressão digital de colaboração do usuário erickdeoliveiraleal ao longo da região de Brasília. Como se trata de heatmap, as regiões em tonalidade mais avermelhada são as de maior atividade. A região do plano piloto concentra as atividades do usuário, bem como as regiões administrativas à oeste do plano, que também são as mais populosas da grande Brasília.
from folium import plugins
bsb_hp = folium.Map(location=[-15.7665, -47.9471], tiles='stamentoner',zoom_start=9)
bsb_hp.add_children(plugins.HeatMap(zip(lat_top1, lon_top1), radius = 15))
O presente trabalho apresentou um trabalho de análise de dados com ciclo completo, desde a aquisição dos dados, tratamento destes, bem como alimentação de um banco de dados e posteriomente a exploração dos dados para obtenção de conclusões e insights a partir dos dados consolidados. Dessa maneira, o projeto proporcionou um aprendizado em situações de "vida-real" que um analista de dados pode encontrar em sua vida profissional.